Based on the different storage methods for child suffix trees, we introduce several search techniques.
This layered index is based on the suffix tree and is divided into several layers; each layer usually contains several child suffix trees.
The construction algorithm has three parts: suffix sorting, longest common prefix (LCP) computation, and child suffix tree construction.
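The three parts can be illustrated with a small sketch. It is not the construction algorithm from the source: the suffix sort is a naive comparison sort, the LCP step uses Kasai's algorithm, and the final grouping of suffixes by a fixed-length prefix (the hypothetical `group_by_prefix` with parameter `k`) is only an assumption about how suffixes might be assigned to child suffix trees.

```python
def build_suffix_array(text):
    # Part 1, suffix sorting. Naive sort of suffix start positions;
    # adequate for illustration, far too slow for large sequences.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text, sa):
    # Part 2, longest common prefix (Kasai's algorithm).
    # lcp[r] is the LCP length of the suffixes at sa[r - 1] and sa[r];
    # lcp[0] is 0 by convention.
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def group_by_prefix(text, sa, k=1):
    # Part 3 (assumed scheme): suffixes sharing their first k characters
    # are collected together, one group per prospective child suffix tree.
    groups = {}
    for start in sa:
        groups.setdefault(text[start:start + k], []).append(start)
    return groups


text = "banana"
sa = build_suffix_array(text)
lcp = build_lcp_array(text, sa)
groups = group_by_prefix(text, sa, k=1)
```

Because the suffixes arrive in sorted order, each group is a contiguous run of the suffix array, and the LCP values inside a run give the depths at which that child tree would branch.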
The second part is the index part: it holds the index of the child suffix trees, plus other information used to quickly locate each child suffix tree on disk.
The index storage is divided into two parts. The first part holds the information of each child suffix tree; a child suffix tree can be represented by several optimized methods, each with different storage requirements and performance.
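A two-part layout of this kind might look like the sketch below. Everything concrete in it is an assumption made for illustration: the JSON encoding of a child tree, the prefix used as the lookup key, and the hypothetical helpers `write_children` and `load_child`. The source does not specify the on-disk format.

```python
import io
import json


def write_children(children, out):
    # Part one: write each child suffix tree as one contiguous blob.
    # Part two: build a directory mapping key -> (offset, length),
    # so a single seek finds any child tree.
    directory = {}
    for key, tree in children.items():
        blob = json.dumps(tree).encode("utf-8")
        directory[key] = (out.tell(), len(blob))
        out.write(blob)
    return directory


def load_child(directory, key, src):
    # Jump straight to the requested child tree without scanning the rest.
    offset, length = directory[key]
    src.seek(offset)
    return json.loads(src.read(length).decode("utf-8"))


# An in-memory buffer stands in for the disk file.
storage = io.BytesIO()
children = {"a": [5, 3, 1], "b": [0], "n": [4, 2]}
directory = write_children(children, storage)
child_n = load_child(directory, "n", storage)
```

Keeping the directory small enough to stay in memory means a lookup costs one seek and one read, whichever representation is chosen for the child trees themselves.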
The sequences are transformed into relative sequences, in which every segment is normalized by the previous segment; this normalizes subsequences of any length at any position. To improve query efficiency, the feature vectors of the relative sequences are discretized into categories and indexed with a suffix tree, and the result of the suffix tree search is the set of potentially similar subsequences.
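A minimal sketch of the idea follows. It assumes each segment is summarized by a single positive number (for example its amplitude), that "normalized by the previous segment" means dividing by it, and that categories are formed by fixed ratio boundaries. The names `to_relative` and `discretize`, the boundaries, and the alphabet are all hypothetical; the source's feature vectors may be richer.

```python
from bisect import bisect_right


def to_relative(segments):
    # Each segment is divided by its predecessor, so the result does not
    # depend on the absolute scale of the original sequence.
    # Assumes no segment value is zero.
    return [cur / prev for prev, cur in zip(segments, segments[1:])]


def discretize(ratios, boundaries=(0.5, 0.9, 1.1, 2.0), alphabet="abcde"):
    # Map every ratio to one symbol; the resulting string is what a
    # suffix tree would index.
    return "".join(alphabet[bisect_right(boundaries, r)] for r in ratios)


original = [2.0, 4.0, 4.0, 1.0]
scaled = [20.0, 40.0, 40.0, 10.0]  # same shape, ten times larger

symbols_original = discretize(to_relative(original))
symbols_scaled = discretize(to_relative(scaled))
```

Both inputs produce the same symbol string, which is why a suffix tree over the symbols can return subsequences that are similar in shape but differ in scale.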
The suffix tree is a good index structure for smaller sequences, but it does not suit large sequences because of the so-called "memory bottleneck". The suffix array is the closest competing structure: it needs less space than a suffix tree, but it is less efficient for searching. Index methods based on q-grams and q-samples support fast search but cannot be used to find data with low similarity.
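To make the search trade-off concrete, here is a sketch of exact pattern lookup in a suffix array. It is the textbook binary search, not a method from the source, and the function names are illustrative. Each probe compares up to m characters, so a pattern of length m costs O(m log n) character comparisons in the worst case, whereas a suffix tree is walked in time proportional to m.

```python
def build_suffix_array(text):
    # Naive construction, for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with the pattern.
    return sorted(sa[first:lo])


text = "banana"
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ana")
```

The array stores one integer per suffix, which is where the space saving over a suffix tree comes from.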